For the study of information propagation, one fundamental problem is uncovering universal laws governing 
the dynamics of information propagation. This problem, from the microscopic perspective, is formulated as 
estimating the propagation probability that a piece of information propagates from one individual to 
another. Such a propagation probability generally depends on two major classes of factors: the intrinsic 
attractiveness of information and the interactions between individuals. Despite the fact that the temporal 
effect of attractiveness is widely studied, temporal laws underlying individual interactions remain unclear, 
causing inaccurate prediction of information propagation on evolving social networks. In this report, we 
empirically study the dynamics of information propagation, using the dataset from a population-scale social 
media website. We discover a temporal scaling in information propagation: the probability that a message 
propagates between two individuals decays with the latency since their latest interaction, 
obeying a power-law rule. Leveraging the scaling law, we further propose a temporal model to estimate 
future propagation probabilities between individuals, reducing the error rate of information propagation 
prediction from 6.7% to 2.6% and increasing the number of customers reached by viral marketing by 9.7%. 



In recent years, information propagation on social networks has been attracting much attention from academia 
and industry' Understanding the mechanisms of information propagation, with or without exogenous and 
endogenous factors, is a fundamental task to uncover the universal laws governing the process of information 
propagation, which is important for better explaining the dynamics of information propagation'", predicting 
information popularity", and initiating viral marketing campaigns'^"". This task, from the microscopic perspective, 
is formulated as inferring and estimating the propagation probability that a piece of information propagates 
from one individual to another along social links connecting them. 

The difficulty of estimating propagation probability lies in the complex interaction pattern between individuals 
and the co-existence of various confounding factors, such as the interplay between social selection and social 
influence. Previous studies empirically identified two classes of factors that drive information propagation: the 
attractiveness of information and the interactions between individuals. Existing studies on the first class mainly 
discussed three fundamental mechanisms with respect to message attractiveness'^: the time-invariant intrinsic 
attractiveness or fitness'"", the Matthew effect in popularity accumulation'^, and the freshness of messages 
decaying in a power-law^°, exponential''^^, Rayleigh^'''^'', or log-normal'^ manner with respect to the time span 
since the message was posted^^. In contrast, most conventional studies on the second class were limited to static or 
quasi-static scenarios, assuming time-invariant interactions between any pair of individuals. Researchers estimated 
a propagation probability by indifferently aggregating recent and long-ago interactions^''^'', or by learning a 
probability function with static features, including structural characteristics of the underlying network", 
demographic features''", and topical and contextual features'". Few studies explored the possibility that individual 
interactions change with time. A recent study modeled social influence as a Markovian chain on temporally 
sliced snapshots of a social network, but did not reveal the intrinsic temporal scaling of how social influence 
evolves^"*. 

In fact, most real-world social networks are far from static. On evolving social networks, whether a piece of 
information will be propagated depends more on the instantaneous frequency of individual interactions than on the 
average frequency indifferently aggregated over recent and long-ago interactions. Hence, neglecting the dynamic 
nature of individual interactions and its crucial role in information propagation is problematic and leads to 
inaccurate predictions. A possible solution is to work only on recent interactions, based on temporally sliced 
snapshots of interactions. However, it is hard to determine the appropriate temporal scale of snapshots since the 
frequency of interactions is scale-free'^. Therefore, we lack a full understanding of the temporal scaling of 
information propagation, which is crucial for grasping the propagation dynamics of information.

SCIENTIFIC REPORTS | 4:5334 | DOI: 10.1038/srep05334 

In this report, we study whether and how individual interactions 
vary temporally, and their role in predicting the instant propagation 
probability. Intuitively, a high frequency of recent communication 
implies strong instant interaction and a high propagation probability. 
As a proxy for recency, latency is defined as the idle time since 
the latest communication between two individuals. A long latency 
generally reflects a low tendency of future interaction. Thus, analyzing 
the interdependence between the latency and the trend of a 
propagation probability provides a distinctive lens for understanding 
the temporal effect of information propagation. Using this 
proxy, we study a population-scale social media dataset and 
empirically validate the intuition that a longer latency 
indicates a relatively lower instant propagation probability. 

To focus on analyzing the temporal scaling of propagation probabilities 
from the perspective of individual interactions, in this report 
we do not consider the factors of information attractiveness, and 
instead calculate a propagation probability between two individuals 
as the fraction of messages propagated from one to the other that are 
retweeted rather than neglected. This methodology is reasonable when the 
number of messages is sufficient to largely average out information 
attractiveness. In this way, the temporal scaling of information propagation 
fully reflects the temporal scaling of individual interactions. 

Results 

The studies are based on a publicly available dataset (WISE 2012 
Challenge, http://www.wise2012.cs.ucy.ac.cy/challenge.html) 
collected from Sina Weibo, the largest Chinese microblogging website, 
similar to Twitter. After some simple preprocessing of the dataset (see 
Section S1), half a million users created 1.2 million following 
relations among them, providing channels for the propagation of 8 million 
messages. We denote by an edge $(v_i, v_j)$ the relation that a user $v_j$ 
(called the follower) follows another user $v_i$ (called the followee). Each 
time $v_j$ sees a message $k$ posted or retweeted by $v_i$ that $v_j$ has not 
retweeted before, we set $\delta_{i,j,k} = 1$ if $v_j$ retweets $k$, forming a positive 
example indicating that $v_i$ successfully activates $v_j$ to retweet $k$; otherwise, 
$\delta_{i,j,k} = 0$ for a negative example if $v_j$ neglects $k$. For each positive/negative 
example, we measure the latency $\tau_{i,j,k}$ as the time span since 
the latest time $v_j$ retweeted a message from $v_i$. 
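The labeling rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' preprocessing code: the function name `label_examples` and the input format (a chronologically sorted list of arrival times paired with a retweeted flag, for one edge) are hypothetical, and returning `None` for messages that arrive before the first retweet on the edge is an assumption, since the text does not say how those examples are handled.

```python
from typing import List, Optional, Tuple

def label_examples(arrivals: List[Tuple[float, bool]]) -> List[Tuple[int, Optional[float]]]:
    """Label the messages arriving on one edge (v_i, v_j).

    arrivals: chronologically sorted (t_k, retweeted) pairs, where t_k is the time
    v_i posted or retweeted message k and `retweeted` says whether v_j retweeted it.
    Returns one (delta, latency) pair per message. Latency is measured from the
    latest earlier positive example on this edge; before the first positive it is
    undefined, and this sketch returns None (an assumption, not from the paper).
    """
    examples: List[Tuple[int, Optional[float]]] = []
    last_positive_time: Optional[float] = None
    for t, retweeted in arrivals:
        delta = 1 if retweeted else 0
        latency = None if last_positive_time is None else t - last_positive_time
        examples.append((delta, latency))
        if retweeted:
            # A successful activation resets the latency clock for later messages.
            last_positive_time = t
    return examples

print(label_examples([(0, True), (3, False), (5, True), (12, False)]))
# [(1, None), (0, 3), (1, 5), (0, 7)]
```

Note that the latency of a positive example is measured from the previous positive example, not from itself, which is why the clock is updated only after the example is recorded.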

We begin exploring the temporal scaling of information propagation 
by examining time stamps of positive examples on two randomly 
selected edges, i.e., a followee and two of his followers. Figures 1a 
and 1b reveal a non-uniform density of positive examples: 
the followers frequently retweet messages from the followee in several 
short time periods, separated by long idle periods. This implies a 
burst phenomenon in individual interactions: short time frames of 
intense interactions are separated by long idle periods'^. To provide 
solid evidence for the existence of bursts in retweeting behaviors, we 
depict in Figure 1e the distribution of latency of all positive examples. 
The power-law distribution of latency, reflecting the emergence of 
bursty retweeting behaviors, exhibits the temporal nature of individual 
interactions. Note that static individual interactions would imply 
a time-invariant propagation probability on each edge, which would 
make retweeting behaviors a homogeneous Poisson process and 
result in an exponential distribution of latency. 
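The last step can be spelled out with a short derivation. It rests on an illustrative assumption not stated in the text: messages from the followee reach the follower as a homogeneous Poisson process with rate $\mu$, each retweeted independently with a constant probability $p$. The symbols $\mu$, $\lambda$, $\gamma$, $c$ and $f$ are introduced here only for this sketch.

```latex
% Assumption for illustration: arrivals ~ Poisson(\mu), each retweeted with constant p.
\begin{align}
  \lambda &= p\,\mu
    && \text{(thinning: retweets form a Poisson process)}\\
  \Pr(\tau > t) &= e^{-\lambda t}, \qquad f(t) = \lambda e^{-\lambda t}
    && \text{(exponential latency)}\\
  \ln f(t) &= \ln \lambda - \lambda t
    && \text{(linear in } t\text{)}\\
  f(t) \propto t^{-\gamma} &\;\Longrightarrow\; \ln f(t) = c - \gamma \ln t
    && \text{(linear in } \ln t\text{)}
\end{align}
```

An exponential latency distribution is therefore a straight line on semi-log axes, whereas a power-law distribution such as the one in Figure 1e is a straight line on log-log axes; the two shapes are easy to tell apart on the latency histogram.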

The temporal nature of individual interactions makes it necessary 
to assign a distinct propagation probability to every retweeting/neglecting 
behavior, even those occurring on the same edge, reflecting the 
instant tendency that a follower retweets a followee's message at the 
time that message arrives. To uncover the temporal scaling of instant 
propagation probabilities, we investigate the interdependence between 
the propagation probability behind every retweeting/neglecting 
behavior and the latency associated with it. The interdependence 
is suggested by the distribution of retweeting/neglecting behaviors on 
those two edges against the associated latency, where most retweeting 
behaviors occur with short latency (Figures 1c and 1d). We calculate 
the fraction of retweeting behaviors among all retweeting/neglecting 
behaviors over all edges to estimate the unobservable instant propagation 
probability at a given latency. The propagation probability decreases 
with the latency in a power-law manner (Figure 1f). Fitting the log-log 
curve in Figure 1f yields a consistent slope of -0.71, suggesting the 
following temporal scaling between a propagation probability $\Pr(\delta_{i,j,k} = 1)$ 
behind a retweeting/neglecting behavior and its associated latency $\tau_{i,j,k}$: 

$$\Pr(\delta_{i,j,k} = 1) \propto \tau_{i,j,k}^{-0.71} \qquad (1)$$
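The estimate behind Figure 1f (fraction of positive examples at a given latency, then a straight-line fit on log-log axes) can be sketched as follows. This is a minimal illustration under assumptions: the logarithmic binning, the number of bins, and the helper names `binned_ratio` and `fit_loglog_slope` are hypothetical choices, not the paper's exact procedure.

```python
import numpy as np

def binned_ratio(latencies, deltas, n_bins=20):
    """Fraction of positive examples in logarithmically spaced latency bins.

    Assumes every latency is >= 1 (as the Decay model requires). Bins that are
    empty or contain no positives are skipped, since their log is undefined.
    """
    tau = np.asarray(latencies, dtype=float)
    d = np.asarray(deltas, dtype=float)
    edges = np.logspace(0.0, np.log10(tau.max() * 1.0001), n_bins + 1)
    idx = np.digitize(tau, edges) - 1  # bin index of each example
    centers, ratios = [], []
    for b in range(n_bins):
        mask = idx == b
        if not mask.any() or d[mask].sum() == 0:
            continue
        centers.append(np.sqrt(edges[b] * edges[b + 1]))  # geometric bin center
        ratios.append(d[mask].mean())                     # positives / all examples
    return np.array(centers), np.array(ratios)

def fit_loglog_slope(centers, ratios):
    """Least-squares slope of log10(ratio) against log10(latency)."""
    slope, _intercept = np.polyfit(np.log10(centers), np.log10(ratios), 1)
    return float(slope)
```

On synthetic examples drawn with a propagation probability proportional to $\tau^{-0.71}$, the fitted slope should land close to -0.71. Equal-width logarithmic bins keep the bias of using geometric bin centers a constant factor, so it shifts the intercept but not the slope.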

We further study whether retweeting behaviors on different edges 
share the same power exponent governing the temporal scaling. As 
shown in Figures 1a-d, although the retweeting behaviors on the two 
edges both obey the power-law temporal scaling, their power exponents 
are quite different. Therefore, we need to assign an edge-specific exponent 
to each edge in order to model the temporal scaling of information 
propagation on different edges of social networks. 

Motivated by the observed temporal scaling, we propose a temporal 
model, namely the Decay model, to predict propagation probability. 
We evaluate the performance of the model by applying it to 
predict retweeting behaviors and to launch a viral marketing strategy, 
comparing it with four mainstream baselines, namely MLE, EM^'', Static 
Bernoulli^', and Static PC Bernoulli^'. 

The first evaluation experiment measures how well a model 
predicts whether or not an individual will retweet an 
incoming message. Figure 2a reports AUC, the area under the 
Receiver Operating Characteristic (ROC) curve, which is equivalent to the 
probability that a classifier correctly distinguishes a positive example 
from a negative one. The Decay model outperforms all baselines, 
raising AUC from 93.3% to 97.4%. Intuitively speaking, when facing 
a randomly selected pair of a retweeting behavior and a neglecting 
behavior, the Decay model reduces the error rate of incorrectly distinguishing 
them by more than half relative to the best baseline (from 6.7% to 2.6%). We then report the 
perplexity on the testing set against the training set ratio, which measures 
how well a model trained with incomplete observations generates 
the testing examples. As shown in Figure 2b, the 
Decay model achieves the lowest (best) perplexity among all tested 
models. The advantage of the Decay model is consistent across all examined 
training set ratios, with a more significant improvement on a relatively 
smaller training set. We also evaluate the Decay model with the 
ROC curve, a metric appropriate for extremely imbalanced 
datasets such as the one we use in this report (as well as most real-world 
social media), where positive examples make up less than 1%. 
The ROC curve, which plots sensitivity (true positive rate) against one minus 
specificity (false positive rate), is insensitive to the ratio between 
positive and negative examples. Figure 2c reports the ROC curves 
of the Decay model and baselines using 90% of examples as the 
training set. Results for other training set ratios are similar. The figure 
shows that the Decay model best distinguishes retweeting behaviors 
from neglecting behaviors, with a significant improvement over all 
baselines. 

The second evaluation measures how accurately a model predicts 
propagation probabilities. Intuitively, more accurate predictions 
would help select a better initial seed set, triggering a larger fraction 
of individuals. We split all examples into 4 groups in chronological 
order with respect to example time stamps. Each group contains 
examples from 30 weeks (see Section S6 for details). The Decay model 
and baselines train on examples in the earlier 205 days (training 
phase) and predict the propagation probabilities in the last 5 days 
(evaluation phase). Based on those predictions, a state-of-the-art influence 
maximization algorithm (CELF++'^) is used to select an initial 
seed set maximizing the expected eventual influence spread. We then 
estimate the pseudo actual spread of such a seed set as the number of 
nodes reachable from the seed set on a propagation network, which is 
a subgraph of the social network consisting of edges with at least one 
actual retweeting behavior in the last 5 days. As reported in Figure 2d 
(only one group shown), the largest pseudo actual spread comes from 
the seed set selected on propagation probabilities predicted by the 
Decay model, which eventually reaches 2,590 nodes, a 9.7% increase 
over the best baseline, i.e., Static PC Bernoulli, which reaches 2,361 
nodes. The increase in pseudo actual spread demonstrates that the 
Decay model predicts the propagation probabilities more accurately, 
confirming our finding that individual interactions decay with latency. 

Figure 1 | Characterizing propagation probabilities. (a,b) Time stamps of positive examples (retweeting behaviors) on two random edges. Each vertical 
line represents a retweeting behavior occurring at the time stamp marked on the horizontal axis. (c,d) Positive (retweeting) and negative (neglecting) 
examples on those two edges. Vertical lines in the upper half represent positive examples, while those in the lower half represent negative ones. Most 
positive examples are concentrated in the left zone, i.e., most retweeting behaviors occur with short latency. The tendency is stronger in (c) than in (d). 
(e) Distribution of latency of retweeting behaviors over all edges. (f) Ratio of positive examples to all examples on all edges with respect to the 
associated latency, demonstrating the power-law interdependence between the propagation probability and the latency. 
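For readers unfamiliar with seed selection, the idea behind CELF-family algorithms can be sketched as below. This is not CELF++ itself and not the authors' implementation: it is the simpler CELF lazy-greedy scheme, with expected spread estimated by Monte Carlo runs of an independent-cascade model on the predicted probabilities. The graph format and the names `simulate_ic`, `expected_spread` and `celf_select` are hypothetical.

```python
import heapq
import random
from typing import Dict, List, Set, Tuple

# Hypothetical input format: followee -> [(follower, predicted propagation probability)]
Graph = Dict[str, List[Tuple[str, float]]]

def simulate_ic(graph: Graph, seeds: Set[str], rng: random.Random) -> int:
    """One independent-cascade run; returns how many nodes end up active."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly active node gets one chance to activate each follower.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)

def expected_spread(graph: Graph, seeds: Set[str], runs: int, rng: random.Random) -> float:
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs

def celf_select(graph: Graph, k: int, runs: int = 200, seed: int = 0) -> List[str]:
    """Lazy-greedy (CELF-style) choice of k seeds maximizing estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v, _ in nbrs}
    # Heap entries: (-marginal gain, node, number of seeds when the gain was computed)
    heap = [(-expected_spread(graph, {u}, runs, rng), u, 0) for u in sorted(nodes)]
    heapq.heapify(heap)
    seeds: List[str] = []
    spread = 0.0
    while heap and len(seeds) < k:
        neg_gain, u, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # Gain is up to date; by submodularity no stale entry can beat it.
            seeds.append(u)
            spread += -neg_gain
        else:
            gain = expected_spread(graph, set(seeds) | {u}, runs, rng) - spread
            heapq.heappush(heap, (-gain, u, len(seeds)))
    return seeds
```

The laziness pays off because marginal gains can only shrink as seeds are added, so most stale heap entries never need to be re-simulated. Whichever model supplies the edge probabilities, the selection step is the same; only the probabilities differ.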

Discussion 

In this report, we uncovered the temporal scaling in information 
propagation from the perspective of individual interactions: a 
propagation probability decays slowly, in a power-law manner, with 
the latency since the two individuals' latest interaction. This dynamic 
nature was demonstrated by empirical studies on a large-scale public 
social media dataset, which showed the power-law interdependence 
between a propagation probability and latency. 

With the observed temporal scaling, a Decay model was proposed 
to predict future propagation probability among individuals, 
incorporating a time-invariant base probability and an edge-specific 
decaying exponent on each edge. The model is applicable in scenarios where an 
underlying social network and trackable information propagation 
with time stamps are observed, such as microblogging (Twitter 
and Sina Weibo), blog sites, book sharing sites and email promotion 
networks. Empirical evaluations showed that the Decay model 
outperformed mainstream baselines in predicting retweeting behaviors, 
reducing the expected error rate of incorrectly identifying a retweeting 
behavior by more than half. 

Figure 2 | Model evaluation. (a) AUC of the Decay model and baselines. AUC measures the area under the ROC curves, and thus is equivalent to the 
probability that a trained model correctly distinguishes a randomly selected positive example from a randomly selected negative example. (b) Perplexity 
of the Decay model and baselines when predicting retweeting behaviors, against the training set ratio. A lower perplexity indicates better prediction 
accuracy, meaning that a testing example surprises a trained model less. (c) Receiver Operating Characteristic (ROC) curves with a training set of 90% of 
examples. (d) Influence spreads of an initial seed set selected on propagation probabilities predicted by the Decay model and baselines. 

From the perspective of machine learning, the discovered temporal 
scaling provides an additional feature to estimate propagation 
probability. While traditional models assume a static propagation 
probability, the proposed Decay model additionally explores the 
temporal effect of a propagation probability, which explains the increased 
accuracy. Generally speaking, a model with more features requires 
more training data and suffers from severe over-fitting on sparse 
data. This partly explains why traditional models do not consider 
temporal features. To mitigate sparsity, the Decay 
model introduces a prior distribution of the decaying exponent $p(\alpha)$, 
suggested by the global decaying exponent in the empirical study results. 
The prior distribution effectively mitigates sparsity: the 
improvement of the Decay model over baselines is even more 
significant with a relatively smaller training set (Figures 2a and 2b). Since 
typically only a few retweeting behaviors are observed on an 
edge in real-world scenarios, the strong performance of the 
Decay model on sparse data is of great practical importance. 

It is worth noting that the viral marketing evaluation is not 
conducted using Monte Carlo simulations, as is done in most influence 
maximization studies. This is because what we compare are the 
configurations of propagation probabilities estimated by the various 
models, so it would be unfair to run Monte Carlo simulations with any 
estimated configuration; otherwise, a model that estimates all probabilities 
equal to one would surely win. Instead, we estimate the propagation spread in 
a pseudo-actual way. We build a propagation network, a subgraph of 
the social network, with edges on which at least one retweeting behavior 
occurs in the 5-day evaluation phase. Therefore, the reachability of a 
node on the propagation network measures its pseudo actual influence 
spread during those 5 days. This is equivalent to one Monte Carlo 
simulation that is generated by the (unknown) actual propagation process 
and observed through the actual retweeting behaviors. The estimated 
propagation spread is deterministic, without any random deviation. 
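The pseudo-actual evaluation described above amounts to a window filter followed by a graph reachability count. A minimal sketch, assuming a hypothetical input of `(followee, follower, retweet_time)` triples and counting the seeds themselves among the reached nodes (the text does not say whether seeds are included):

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

def build_propagation_network(retweets: Iterable[Tuple[str, str, float]],
                              start: float, end: float) -> Dict[str, Set[str]]:
    """Keep edge followee -> follower if at least one retweet falls in [start, end)."""
    network: Dict[str, Set[str]] = {}
    for followee, follower, t in retweets:
        if start <= t < end:
            network.setdefault(followee, set()).add(follower)
    return network

def pseudo_actual_spread(network: Dict[str, Set[str]], seeds: Iterable[str]) -> int:
    """Number of nodes reachable from the seeds (seeds included), via breadth-first search."""
    reached = set(seeds)
    queue = deque(reached)
    while queue:
        u = queue.popleft()
        for v in network.get(u, ()):
            if v not in reached:
                reached.add(v)
                queue.append(v)
    return len(reached)
```

Because the network is fixed by the observed retweets, the count is deterministic: two seed sets can be compared directly without any simulation noise.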

In the Decay model, the base probability q is considered a free 
variable whose value is fully determined by maximum-a-posteriori 
inference with a prior distribution. In fact, the Decay model can 
incorporate any endogenous or exogenous factors by 
rewriting q as a function of those factors, such as demographic, 
structural, content and context features. Parameters of such a 
function could also be estimated by maximum-a-posteriori inference. 
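As one concrete instance of rewriting q as a function of features, the sketch below uses a logistic function of a feature vector. This is a hypothetical choice for illustration: the paper does not specify the functional form, and the names `base_probability` and `decay_probability_with_features` are invented here. The latency term is left exactly as in the Decay model.

```python
import math
from typing import Sequence

def base_probability(weights: Sequence[float], features: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical q_ij = sigmoid(bias + w . x), computed in a numerically stable way."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so this cannot overflow
    return e / (1.0 + e)

def decay_probability_with_features(weights: Sequence[float], features: Sequence[float],
                                    alpha: float, tau: float, bias: float = 0.0) -> float:
    """Decay-model probability with the free base probability replaced by a feature function."""
    if tau < 1.0:
        raise ValueError("latency must be >= 1")
    return base_probability(weights, features, bias) * tau ** (-alpha)
```

A sigmoid keeps q inside [0, 1] for any weights, so the product with $\tau^{-\alpha}$ stays a valid probability whenever $\tau \geq 1$; the weights would then be fitted jointly with the decaying exponents.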

In the first evaluation experiment, the Decay model is tested with 
only one testing example on each edge, for ease of calculating 
latency. When facing multiple testing examples (e.g., predicting 
whether an individual will retweet a series of messages in a month), 
one should predict those examples one by one in chronological 
order and calculate the expected latency of a later example over the 
joint probability distribution of predicted results of all previous 
testing examples. 
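The chronological procedure above can be sketched by tracking a distribution over the time of the latest successful activation on an edge. This is an illustrative sketch, not the authors' code: the name `sequential_predictions` and the input format are hypothetical, and all gaps are assumed to be at least 1 time unit. Besides the expected latency the text mentions, it also reports the marginal retweet probability (the model probability averaged over the latency distribution). Because $\tau^{-\alpha}$ is convex, plugging the expected latency into the model generally gives a different value than this average.

```python
from typing import Dict, List, Sequence, Tuple

def sequential_predictions(q: float, alpha: float, last_activation: float,
                           test_times: Sequence[float]) -> List[Tuple[float, float]]:
    """Predict test messages on one edge one by one, in chronological order.

    Tracks a distribution over the time of the latest successful activation,
    so each message gets (marginal retweet probability, expected latency).
    Assumes every gap t - t_last is >= 1, as the Decay model requires.
    """
    dist: Dict[float, float] = {last_activation: 1.0}  # latest activation time -> probability
    results: List[Tuple[float, float]] = []
    for t in test_times:
        p_marginal = 0.0
        expected_latency = 0.0
        new_dist: Dict[float, float] = {}
        for t_last, weight in dist.items():
            tau = t - t_last
            p = q * tau ** (-alpha)  # Decay-model probability given this latency
            p_marginal += weight * p
            expected_latency += weight * tau
            # Retweeted: the latest activation moves to t. Neglected: it stays at t_last.
            new_dist[t] = new_dist.get(t, 0.0) + weight * p
            new_dist[t_last] = new_dist.get(t_last, 0.0) + weight * (1.0 - p)
        results.append((p_marginal, expected_latency))
        dist = new_dist
    return results
```

The support of the distribution grows by at most one entry per message, so the cost is quadratic in the number of test messages on an edge, which stays small in the sparse setting discussed above.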

Choosing the latency as a proxy for recency is equivalent to 
approximating the information propagation occurrences as a 
first-order Markov process, i.e., only the idle time since the latest 
interaction, instead of all historical interactions, affects the current 
decision. Such an approximation, which avoids expensive calculation 
with a variable number of parameters required to 
build a complicated function defined on all historical interactions, 
succeeds in revealing strong evidence of interdependence between 
propagation probabilities and latency and in building a prediction 
model that outperforms the baselines. This supports the important role that 
temporal scaling plays in characterizing a propagation 
probability. 

As an open question for future work, it would be interesting to characterize 
the influential nodes identified by high propagation probabilities 
estimated by the Decay model, and to demonstrate the evolving 
distribution of instantaneously influential nodes on a social network. 

Methods 

The proposed Decay model describes the propagation probability $\Pr(\delta_{i,j,k} = 1)$ that an 
individual $v_i$ will successfully activate another individual $v_j$ to retweet a message $k$, 
which is believed to be determined by two factors: 

• $q_{ij} \in [0, 1]$: the base probability associated with the edge $(v_i, v_j)$; 

• $\tau_{i,j,k} \in [1, +\infty)$: latency, the time span since the latest time $v_i$ activated $v_j$, i.e., 
$\tau_{i,j,k} = t_{k,i} - t_{k',i}$, where $t_{k,i}$ is the time stamp when $v_i$ posts or retweets $k$, and $k'$ is 
the latest message before $k$ that $v_i$ activated $v_j$ to retweet. 

Specifically, the propagation probability is as follows, 

$$\Pr(\delta_{i,j,k} = 1) = q_{ij}\,\tau_{i,j,k}^{-\alpha_{ij}} \qquad (2)$$

where $\alpha_{ij} > 0$ is a decaying exponent associated with the edge $(v_i, v_j)$. The decaying 
exponent is edge-specific, with a prior distribution $p(\alpha)$ reflecting the global decaying 
exponent. Traditional models without temporal scaling of propagation probabilities 
can be viewed as special cases of the Decay model with a constant $\alpha = 0$. 

Latency is required to be bounded below, i.e., $\tau_{i,j,k} \geq 1$, to guarantee $\Pr(\delta_{i,j,k} = 1) \in [0, 1]$. 
Specifically, $\tau_{i,j,k} = 1$ results in $\Pr(\delta_{i,j,k} = 1) = q_{ij}$, revealing the intuitive meaning of 
the base probability: it equals the probability that $v_i$ successfully activates $v_j$ to 
retweet a message $k$ that arrives immediately after a previous successful activation. 
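The model's probability and its validity constraints translate directly into a small function. A minimal sketch; the name `decay_probability` is hypothetical. It accepts $\alpha = 0$ so that the static special case mentioned above can be reproduced, even though the model itself uses $\alpha_{ij} > 0$.

```python
def decay_probability(q: float, alpha: float, tau: float) -> float:
    """Decay model: Pr(delta_{i,j,k} = 1) = q_ij * tau_{i,j,k} ** (-alpha_ij)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("base probability q must lie in [0, 1]")
    if alpha < 0.0:
        raise ValueError("decaying exponent alpha must be non-negative")
    if tau < 1.0:
        raise ValueError("latency tau must be >= 1 so the result stays in [0, 1]")
    return q * tau ** (-alpha)
```

With $\tau = 1$ the function returns $q$ unchanged (the base-probability interpretation above); with $\alpha = 0.5$ and $\tau = 4$ it halves $q$; and with $\alpha = 0$ it returns $q$ for every latency, which is the static special case.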

The hidden parameters $q$ and $\alpha$ are inferred with a maximum-a-posteriori estimate 
with prior distributions $p(q)$ and $p(\alpha)$. See Section S3 for details. 
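To make the maximum-a-posteriori step concrete, the sketch below maximizes the Bernoulli log-likelihood of one edge's examples plus log-priors over a grid of $(q, \alpha)$ values. Everything specific here is an assumption: the Beta prior on $q$, the Gamma prior on $\alpha$ (centered by default at the global exponent 0.71), the grid search, and the name `map_estimate`. The paper's actual priors and optimizer are given in its Section S3 and may differ.

```python
import numpy as np

def map_estimate(deltas, taus, a=1.0, b=1.0, shape=2.0, scale=0.355,
                 q_grid=None, alpha_grid=None):
    """Grid-search MAP estimate of (q_ij, alpha_ij) for one edge under the Decay model.

    Hypothetical priors (the paper specifies its own in Section S3):
    q ~ Beta(a, b) and alpha ~ Gamma(shape, scale); the defaults put the prior
    mean of alpha at 0.71, the global exponent. Requires every tau >= 1.
    """
    d = np.asarray(deltas, dtype=float)
    log_tau = np.log(np.asarray(taus, dtype=float))
    if q_grid is None:
        q_grid = np.linspace(0.005, 0.995, 199)   # excludes 0 and 1 so logs stay finite
    if alpha_grid is None:
        alpha_grid = np.linspace(0.01, 3.0, 300)  # alpha > 0
    log_prior_q = (a - 1.0) * np.log(q_grid) + (b - 1.0) * np.log1p(-q_grid)
    best_logpost, best_q, best_alpha = -np.inf, None, None
    for alpha in alpha_grid:
        # p[m, n]: propagation probability of example n under q_grid[m] and this alpha
        p = q_grid[:, None] * np.exp(-alpha * log_tau)[None, :]
        loglik = (d * np.log(p) + (1.0 - d) * np.log1p(-p)).sum(axis=1)
        logpost = loglik + log_prior_q + (shape - 1.0) * np.log(alpha) - alpha / scale
        m = int(np.argmax(logpost))
        if logpost[m] > best_logpost:
            best_logpost, best_q, best_alpha = logpost[m], float(q_grid[m]), float(alpha)
    return best_q, best_alpha
```

With only a few examples on an edge, the log-prior terms dominate and pull $\alpha$ toward the global exponent, which is the mechanism the Discussion credits for the model's robustness on sparse data. A grid is used only to keep the sketch dependency-free beyond NumPy; a gradient-based optimizer would be the usual choice at scale.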

To demonstrate the performance of the Decay model, four mainstream baselines 
are implemented to estimate and predict propagation probabilities on all edges, 
namely MLE, EM^^, Static Bernoulli^', and Static PC Bernoulli^^ (see Section S4). 
Some other widely used models are not compared because they require user 
profiles or message content, which are absent in this scenario. 

In the retweeting prediction experiment, we apply a next-one strategy to split the examples into a 
training set and a testing set. On each edge, we sort all examples in chronological 
order, take the earliest N% of examples as the training set, and leave the next 
example as the testing set. Thus the size of the training set increases with N%, the 
training set ratio, while the size of the testing set is constant and equal to the number of 
edges. With parameters trained on the training set, the Decay model predicts the label 
$\delta$ of examples in the testing set. 
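The next-one split for a single edge can be sketched as follows. The rounding rule (floor of N% of the examples, clamped so that at least one training example and one test example remain) is an assumption, since the text does not say how fractional counts are handled; the name `next_one_split` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def next_one_split(examples: Sequence[T], ratio: float) -> Tuple[List[T], List[T]]:
    """Split one edge's chronologically sorted examples: earliest N% train, next one test."""
    if len(examples) < 2:
        raise ValueError("need at least two examples on the edge")
    n_train = math.floor(ratio * len(examples))
    # Assumed rounding rule: keep at least one training example and one example to test.
    n_train = min(max(n_train, 1), len(examples) - 1)
    return list(examples[:n_train]), [examples[n_train]]
```

Applied to every edge, this yields exactly one test example per edge, so the testing set size equals the number of edges regardless of the ratio, as stated above.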

The evaluation metrics include perplexity, ROC curve and AUC. The perplexity 
measures how much the testing examples surprise a trained model. A lower perplexity 
demonstrates better prediction ability. It is defined as 

$$\text{perplexity} = \exp\left(-\frac{1}{|D_{\text{test}}|}\sum_{\delta_{i,j,k} \in D_{\text{test}}}\left[\delta_{i,j,k}\ln \hat{P}(\delta_{i,j,k} = 1) + (1 - \delta_{i,j,k})\ln\left(1 - \hat{P}(\delta_{i,j,k} = 1)\right)\right]\right) \qquad (3)$$

where $D_{\text{test}}$ represents the testing set, and $\hat{P}(\delta_{i,j,k} = 1)$ is the estimated propagation 
probability. The Receiver Operating Characteristic (ROC) curve plots sensitivity (true 
positive rate) against one minus specificity (false positive rate). AUC measures the 
area under the ROC curve, which is equivalent to the 
probability that a model correctly distinguishes a randomly selected positive example 
from a randomly selected negative example. A higher AUC indicates a better ability to 
distinguish. See Section S5 for details. 
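The three metrics can be sketched with the standard library alone. This is an illustration, not the evaluation code from Section S5: the function names are hypothetical, perplexity follows the Bernoulli form (exponential of the mean negative log-likelihood of the test labels, with clipping added to avoid $\log 0$), and AUC uses the pairwise definition quoted above, which costs O(positives × negatives). A rank-based formula is the usual choice for large test sets.

```python
import math
from typing import List, Sequence, Tuple

def perplexity(deltas: Sequence[int], probs: Sequence[float], eps: float = 1e-12) -> float:
    """exp of the mean negative Bernoulli log-likelihood of the test labels."""
    total = 0.0
    for d, p in zip(deltas, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() stays finite
        total += math.log(p) if d == 1 else math.log(1.0 - p)
    return math.exp(-total / len(deltas))

def auc(deltas: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for d, s in zip(deltas, scores) if d == 1]
    neg = [s for d, s in zip(deltas, scores) if d == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

def roc_points(deltas: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) points, lowering the threshold step by step."""
    pairs = sorted(zip(scores, deltas), key=lambda pair: -pair[0])
    n_pos = sum(1 for d in deltas if d == 1)
    n_neg = len(deltas) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Move every example tied at this score across the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points
```

For example, with labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.4, 0.1]`, three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75, which matches the area under the ROC points returned for the same inputs.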